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Abstract —Multilayer networks are a useful data structure for 
simultaneously capturing multiple types of relationships between 
a set of nodes. In such networks, each relational definition 
gives rise to a layer. While each layer provides its own set of 
information, community structure across layers can be collectively 
utilized to discover and quantify underlying relational patterns 
between nodes. To concisely extract information from a multilayer 
network, we propose to identify and combine sets of layers 
with meaningful similarities in community structure. In this 
paper, we describe the “strata multilayer stochastic block model” 
(sMLSBM), a probabilistic model for multilayer community 
structure. The central extension of the model is that there exist 
groups of layers, called “strata”, which are defined such that all 
layers in a given stratum have community structure described 
by a common stochastic block model (SBM). That is, layers in 
a stratum exhibit similar node-to-community assignments and 
SBM probability parameters. Fitting the sMLSBM to a multilayer 
network provides a joint clustering that yields node-to-community 
and layer-to-stratum assignments, which cooperatively aid one 
another during inference. We describe an algorithm for sep¬ 
arating layers into their appropriate strata and an inference 
technique for estimating the SBM parameters for each stratum. 
We demonstrate our method using synthetic networks and a 
multilayer network inferred from data collected in the Human 
Microbiome Project. 

Keywords—Stochastic Block Models , Clustering, Multilayer 
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I. Introduction 

Modeling relational information between a set of entities 
can often be successfully achieved through a network repre¬ 
sentation. Here, entities correspond to nodes and edges reflect 
some connection between them. In many applications, there 
are multiple ways to define an edge that can be collectively 
analyzed for a more thorough understanding of the data. Mul¬ 
tilayer networks provide a framework to do this, in that each 
relational definition leads to a new layer in the network (T), 
E). Such data and corresponding networks have shown to be 
useful in many contexts, such as, in the comparison of genetic 
and protein-protein interactions in a cell 0, in understanding 
underlying relationships and community structure across social 
networks [4), and in the analysis of temporal networks 0. 
Thus, given the inherent mulitiplexity of network data across 
fields, there exists a need for the development of appropriate 
tools that can leverage information from all layers to elucidate 
structural patterns. 

Each layer in a multilayer network provides its own infor¬ 
mation about interactions between nodes, and it is useful to 


ask whether sets of layers are providing redundant informa¬ 
tion. Addressing this question requires the development of an 
approach to compress networks into a reduced-layer represen¬ 
tation such that it effectively retains the information from the 
original multilayer network. Aggregating layers can potentially 
result in a loss of information, but it can also successfully 
corroborate the existence of underlying structural patterns. 
Moreover, this can lead to improved identification of structural 
patterns, including enhanced community detection (6). This 
idea of reducibility in multilayer networks was previously 
explored in 0: using an information-theoretic notion of dis¬ 
tance between pairs of network layers, the authors performed 
hierarchical clustering of layers and chose the partition that 
maximized a quality function reflecting information loss due 
to the aggregation of layers. While this approach reflects the 
validity and usefulness of combining layers, it does not result 
in a generative model describing the clusters of redundant 
layers. To further this intuition to a probabilistic framework, we 
have developed the “strata multilayer stochastic block model” 
(sMLSBM), which seeks to compress the multilayer network 
by agglomerating sets of layers into structurally similar groups 
that we refer to as “strata,” while simultaneously clustering 
nodes into communities. To address this joint clustering ques¬ 
tion, our model assumes that network layers in a given stratum 
have the same underlying generative model for community 
structure. Importantly, because layers in a given stratum can 
be regarded as samples from a single probabilistic model, 
this can lead to improved community detection and parameter 
inference. 


A. Network Comparison Based on Community Structure 

The problem of aggregating layers in a multilayer network 
is closely related to the problem of clustering networks. That 
is, given an ensemble of networks, one aims to identify sets 
such that networks within a set have similar characteristics. 
These characteristics, or “features” in this context, can de¬ 
scribe any of the following: micro-scale structural properties 
such as subgraph motifs [si, m ; multiscale properties such 
as community structure ED, the spectra of network-related 
matrices ED and by defining latent roles E2- Although 
clustering layers in a multilayer network is closely related to 
clustering networks in an ensemble, these are distinct problems 
with different difficulties and nuances. We focus on the prior 
pursuit; however, we expect for certain network ensembles that 
it will be beneficial to modify and apply our methods to the 
clustering of networks. 


In this work, we analyze and compare layers in a multi¬ 
layer network based on their community structure. Commu¬ 
nity detection in single-layer networks is an essential tool 
for understanding the organization and functional relatedness 
between nodes in a network 03, 03. Although there are 
many definitions for what constitutes a “community” (T5) . one 
often assumes an “assortative community” in which there is a 
prevalence of edges between nodes in the same community as 
compared to the amount of edges connecting these nodes to the 
remaining network. In seeking to identify such communities, 
numerous approaches have been proposed, including those 
based on maximizing a modularity measure JT6l and fitting 
a generative probabilistic model [ 17|. Because each of these 
approaches present computational challenges for efficiently de¬ 
tecting communities, numerous heuristics exist for developing 
practical algorithms OH], 03], (T9) . (20) . (ST). 

In seeking a statistically-grounded approach for studying 
communities in multilayer networks, we consider the stochastic 
block model (SBM) (22), a popular generative model for 
community structure in networks. The assumption of the 
SBM is that nodes in a particular community are related 
to nodes within and between communities in the same way, 
thus allowing SBMs to describe several types of communities 
(e.g., assortative, disassortative, core-periphery, etc. Ifl5l . (23)). 
There are many other appealing aspects of stochastic block 
models; for example, a model-based approach allows for the 
denoising of networks through the removal of false edges and 
the addition of missing edges (13 . (23. The inference proce¬ 
dure for fitting SBMs to an undirected network with N nodes 
and K communities involves learning the two parameters, 7r 
and Z. Parameter 7r is a K x K symmetric matrix, where 7r mn 
gives the probability of an edge existing between a given node 
in community m and another node in community n. Matrix Z 
is an N x K indicator matrix, wherein each binary entry Zi m 
indicates whether or not node i is in community m. Each row 
of Z is constrained such that J2m=i ^im = 1, he. each node 
only belongs to 1 community. We also define vector z, which 
has entries z$ = argrnax m {Z im } that indicate the community 
to which node i belongs. For a given network, these parameters 
are often inferred through a maximum likelihood approach, and 
once learned, they provide information about the within and 
between community relatedness. 

B. Related Work on Multilayer SBMs 

Due to the ubiquity of network data with multiple network 
layers, community detection in multilayer networks constitutes 
an important body of research. Important directions include 
generalizing the modularity measure na and studying dynam¬ 
ics (25) for this more general setting. Given the usefulness of 
SBMs for the understanding of node organization in single¬ 
layer networks, it is important to extend SBMs to the mul¬ 
tilayer framework, and indeed this direction of research is 
receiving growing attention 0, El, |0, (28), 0. In this 
context, the general assumption is that there are shared patterns 
in community structure across the layers of a multilayer 
network, and the goal is to define and identify a stochastic 
block model that captures this structure. These works have 
explored many types of applications that can arise involving 
multilayer networks, and have therefore given rise to several 
complementary models for multilayer stochastic block models 


(MLSBMs). We now briefly summarize this previous work that 
is very related, but notably different, from the model we study 
herein. 

In Refs. 0, (26), (23, the authors studied situations 
in which many layers follow from a single SBM. In these 
instances, it is possible to obtain improved inference of the 
SBM parameters by incorporating multiple samples from a 
single model. For example, in Ref. 0 the authors considered 
an increasing number of layers, L, and explored asymptotic 
properties of the estimated SBM parameters. Specifically, they 
fit an SBM to each individual layer in a way that utilizes 
the information from all layers, and they showed convergence 
of these estimators to their true values as L —» oo. For a 
network with L layers and K communities in each layer, their 
approach requires an estimate of the community assignment 
matrix Z l and probability matrix tv 1 for each layer l, the 
latter of which involves learning K[K + l)L/2 parameters. 
To this end, the authors extended the variational approximation 
for approximating the maximum likelihood estimates of SBM 
parameters introduced in single-layer SBMs introduced in 1 30 1 
to the multilayer setting. 

Ref. 0 was followed up by Ref. (26) . wherein the authors 
addressed issues that can arise for the model when K and/or 
L is large, or if the network is sparse. They proposed a 
modified model called the restricted multilayer stochastic block 
model (rMFSBM). In this model, instead of learning a set 
of L independent parameters, 7r^ n , for each pair, (m, n), 
each entry in 7r is fully layer-dependent so as to produce a 
reduction in the number of free parameters. Specifically, to 
determine the probability of an edge between a node from 
community m and a node from community n in layer Z, 
they use a logistic link function and model the probability 
as logit (7r^ n ) = 7r mn + /?/. The /?/ is an offset parameter 
representing the particular layer or type of edge. In this model, 
it is necessary to learn K(K + l)/2 + L total parameters. 
Thus, the maximum likelihood estimate for an rMFSBM is a 
regularized estimator. 

Consistent with the theme of fitting a single block model to 
a collection of layers, Ref. 123 is similar to Refs. 0 and (26) 
in that the authors seek to leverage information from all layers 
by considering the joint distribution of layers. Using this, they 
estimated quantities such as the marginal probabilities of node 
assignments to communities and the edge probabilities within 
and between groups. An interesting aspect of their approach is 
that they introduce a covariate capturing the coupling between 
pairs of nodes. For a network with K communities and L 
layers, this requires the estimation of (2 L — 1 )K 2 + (K — 1) 
parameters. 

We summarize Refs. (28) and (29) , which provide tech¬ 
niques to determine whether a single layer network is the 
result of an aggregation procedure in a multilayer network. In 
Ref. (28) , the authors defined a version of multilayer stochastic 
block model and an inference procedure for assessing whether 
or not a single-layer network was actually obtained from an 
aggregation of layers in a multilayer network; they consid¬ 
ered the aggregation of layers using boolean rules. Ref. [29] 
describes two possible generative processes for multilayer 
networks: the edge-covariate and independent-layer models. 
In the edge-covariate model, an aggregated network is defined 
in which a given edge (z, j) only appears in a single layer. 
Aggregating the layers in a multilayer network into a single 
network representation combines all of the edges from each of 


the layers. Thus, the translation of this idea into a generative 
model involves choosing a layer membership for each edge 
and sampling edges with a probability conditioned on adjacent 
nodes. In the independent-layer model, layers are generated 
independently from each other and the only constraint is that 
group membership of the nodes are the same across all layers. 

C. Contributions 

While the literature on MLSBMs has recently grown 
quickly, there is still a need for a probabilistic generative 
model that allows for the layers in a multilayer network to 
be described by multiple SBMs. To this end, we developed 
a novel multilayer stochastic block model, sMLSBM, that 
assigns network layers into disjoint sets that we call strata, 
where a collection of layers in a given stratum are assumed 
to be samples from the same underlying generative model. 
Our method can be viewed as a joint clustering procedure, 
where we seek to group layers into strata and nodes into 
communities. That is, we seek to simultaneously find layer-to- 
strata and node-to-community assignments. In order to address 
practical applications that can involve multilayer networks with 
several strata, layers, communities and nodes, we introduce an 
algorithm that effectively partitions layers into strata and an 
inference procedure to learn the SBM parameters for each 
stratum. Importantly, these two steps—assigning nodes to 
communities and layers to strata—are combined in an iterative 
algorithm so that an improvement in community detection 
can lead to an improvement in the clustering of layers into 
strata, which can iteratively lead to further improvement in 
community detection, and so on. 

To describe the model, the algorithm for fitting the model, 
and its performance, the remainder of this paper is organized 
as follows. In Sec. [II| w e define the model and an algorithm 
for fitting it. In Sec. |III[ we perform numerical experiments on 
synthetic networks. In Sec.|IV| we test the model on correlation 
networks constructed from data from the Human Microbiome 
Project. 

II. sMLSBM: Strata Multilayer Stochastic 
Block Model 

A. Network Definition 

Let G(N,£) define a single network with N nodes and a 
set of undirected edges, £ = {(i, j)}. We define a multiplex 
network, which is one kind of multilayer network m, m, 
by defining a set of network layers, G l (N ,£ l ), where l E C 
and the set C = {1,2,-•• , L} indicates the layers’ indices. 
We denote the collection of L layers as a set, Q, such that 
Q = {G 1 , G 2 , • • • , G L } makes up the multiplex network and 
each element of the set is the network representing a layer. 
Furthermore, we define A = {A^A 2 ,--- ,A L } to be the 
corresponding adjacency matrix representations of the network 
layers in Q. 

B. Model Definition 

Under the sMLSBM, the network layers, G l (N,£ l ) are 
assumed to be generated by a set of S stochastic block 
models, where the layers in stratum s E {1,2, ••• , S}, are 
parameterized by 7V S and Z s (or equivalently, vector which 
has entries zf = argrnax m {Zf m } ). Note that the parameters 



Fig. 1. Objective of strata multilayer stochastic block model (sMLSBM). 

Each of the L = 9 networks here represents a layer in a multilayer network. 
Every network layer has IV = 36 nodes that are consistent across all layers. 
There are S = 3 strata as indicated by the three rows and the colors of 
nodes. Clearly, network layers within a stratum exhibit strong similarities in 
community structure. That is, although each layer follows an SBM with K = 
3 communities, the SBM parameters are identical for layers within a strata 
but differ between layers in different strata. We would like to partition the 
layers into their appropriate strata and learn their associated SBM parameters, 
7r s and Z s . 


7r s and Z s for a single stratum are analogous in meaning to 
their respective parameters in the single-layer SBM case (see 
Sec. For each stratum s, we let C s C C denote the set of 
layers corresponding to s, so that C = (J s C s and 0 = C s D C l 
for all s,t E {1,...,5}, s t. We let L s = \C S \ denote 
the number of layers in strata s so that J2 S L s = L. Finally, 
we allow the number of communities, K s , to vary across the 
strata. 

For a given multilayer network, our objective during in¬ 
ference is to identify the stratum assignment of each layer 
and to learn the collection of strata parameters, II = 
{7T 1 ,7r 2 ,..., 7r 5 } and Z = {Z 1 , Z 2 ,... Z s }. The learned 
SBM parameters for a stratum represent a consensus for the 
associated layers, and so in that sense can be interpreted as 
reducing the effective number of layers 0. However, strata 
can also be interpreted as a way to simply identify layers 
with similarities in community structure. Figure 1 shows a toy 
example of a multilayer network with S = 3 strata, where 
each layer has N = 36 nodes and K = 3 communities. 
Each individual network in this figure represents a layer in the 
network. The nodes in the layers belonging to each stratum are 
colored according to their stratum membership; moreover, it 
is easy to see that layers of a stratum exhibit high similarities 
in community structure. 

As part of our procedure, we specify another parameter that 
we refer to as the adjacency probability matrix, 0 s , which can 
be computed from 7v s and Z s . Specifically, 0 s is an N x N 
matrix such that 6 s - gives the probability of an edge between 
nodes i and j in stratum s. That is, Of- = n ® SyS , where 

J Z J z i Zj 

zf specifies the community number for node i in stratum s. 












Finally, we define the matrix Y of size LxS, wherein an entry 
Y is is a binary indicator of whether or not layer l is assigned 
to stratum s. Note that Y[ s = 1. We also define a vector 
y , which has entries yi = argmax s {Y/ s } to indicate the strata 
to which layer l belongs. 


Phase I 


Fit SBM to each 
network layer 

_„ 

Compute 

0(z,7r) 

/c-means cluster all 

individually 


for each layer 

into S strata 


C. Inference for sMLSBM 

The procedure for fitting an sMLSBM to a given network 
requires finding the layer-to-strata memberships and node- 
to-community memberships that best describe the multilayer 
network. For notational convenience, we introduce hat notation 
to represent the learned parameter estimate from the inference 
procedure. We can write down the marginal likelihood for the 
collection of network layers, Q, as, 

p(£|n) = ^5>(s,iJ,Y|n). (i) 

Z Y 

We assume the probability of an edge between two nodes in 
layer l belonging to stratum s can be modeled as a Bernoulli 
random variable, based on the community membership of the 
nodes. In particular, p(A\- = 1) ~ Bernoulli (7r|, 0 ). 

Since Y and Z are both latent quantities, searching over 
all possible values quickly becomes intractable. To tackle this 
issue, we develop a two-phase algorithm that incorporates a 
clustering algorithm for choosing the best Y. This greedy ap¬ 
proach leads to a significant reduction for the size of the search 
space since only Z must be statistically inferred. Specifically, 
during Phase I, we infer an SBM for each layer in isolation, 
and we cluster together sets of layers that have similar SBM 
parameters. Using these results as an initial condition in Phase 
II, we develop an iterative method that jointly identifies layer- 
to-stratum and node-to-community assignments as well as the 
SBM parameters for each stratum. We provide a schematic of 
the algorithm in Fig. [2j and below we present the two-phase 
algorithm in detail. 


Phase I. Phase I is comprised of two parts. First, we fit an SBM 
to each individual layer l G {1,..., L}, which yields inferred 
SBM parameters iv l and node-to-community memberships 
Z l . Then we cluster the layers based on the similarities of 
7v l and Z l . To infer 7r l and Z l , we use the the inference 
method described in l30l . Here, the authors used a variational 
inference technique to approximate the maximum likelihood 
estimates for the stochastic block model parameters. For the 
set of L layers, this produces sets of SBM parameters for 
each layer, which we denote by II = {7T 1 ,7f 2 ,..., 7r L } and 
Z = {Z 1 , Z 2 ,... Z L } (that is, at this stage of the procedure, 
each layer is temporarily treated as its own stratum). Note 

also that each Z can be equi valently represented by vector 
z l , as described in Sec. 


I-A 


Using the estimates 7v l and 
Z L for a given layer, /, we can construct the corresponding 
adjacency probability matrix, 0 , which is defined entry-wise 

by 0^ = 7 \ l £ £ .. Doing this for each layer results in a collection 
3 . a1 ^ 2 L 

of adjacency probability matrices, ® = {0,0,--,0 }. 

Now, we seek an initial partition of layers into strata based 
on ®. The goal is to identify S sets C s so that the matrices 
{0 } with l G C s are close to one another, but they are 
distant from the remaining matrices, {0 } with l G C\ C s . 
This is accomplished by treating each 0 as a feature vector 


Phase II 



Compute Compute 


Iterate until 

Cluster layers into 
strata under 

l — 

0 ( 1 ) and 0 ( 2 ) 

1 

V. 

1 

Update strata for unique 
combinations of 

0(1) and 0(2) 


Fig. 2. Schematic illustration of our algorithm: Our algorithm for fitting 
an sMLSBM is broken up into two phases: an initialization phase to cluster 
layers into strata, and an iterative phase that allows recursive learning between 
node-to-community and layer-to-strata assignments. 


and applying k- means clustering with S centers so as to 
identify S strata, C s . Note that S can be selected a priori , 
or approximated with a measure such as the gap statistic 
ED. This gives us an initial estimate Y for Y. Note that 
this procedure initially treats each layer as a separate stratum, 
but provides a principled agglomeration of layers into S < L 
strata. 

Phase II. After a first-pass approach for assigning layers 
to strata, we initialize an iterative phase to more effectively 
estimate layer-to-strata assignments as well as the model 
parameters. Specifically, we would like to find the consensus 
SBM for each strata—that is, the K s x K s matrix 7r s and the 
N x K s matrix Z s that maximize the likelihood of the observed 
layers in each stratum. We let A s = {A^} for l G C s denote 
the collection of adjacency matrices corresponding to the L s 
layers in stratum 8. 

We now proceed to maximize the likelihood in each stra¬ 
tum, by extending the framework of Ref. [30] to a multilayer 
context. Note that this is similar to Ref. id, except that we are 
not aiming to infer an SBM probability matrix for each layer, 
individually. In particular, the complete-data log-likelihood for 
stratum s can be written as, 

p(A s ,Z s )=p(A s \Z s )p(Z s ), (2) 

where 

p ( a ° iz s )=n^ 

leC s i<j mn 

To write p( Z s ), it is helpful to introduce a new parameter a ^ 
that represents the probability that a randomly-selected node 



in stratum s belongs to community m, i.e. a ^ = p(Zf m = 1). updates. Doing so yields the following, where the hat notation 
Note that = 1. Using this parameter, we can write symbolizes the current best estimate for the given parameter: 


^)=nn^ J - (4) 

i m 

It follows that the complete-data log-likelihood for the adja¬ 
cency matrices representing the layers in stratum s can be 
expressed as, 

logP(.4 s , Z s ) = log(P(Z s )) + log(P(^l s I Z■*)) 

= EE^i°g«) 

i m 

+ E E E "4 ij logfPmn) 

le£ s i<j rnn 

+ EE D 1 _ A l?) 

leC s i<j mn 

Problems of this variety that involve the need to com¬ 
pute maximum likelihood estimates with incomplete data are 
typically addressed with the expectation maximization (EM) 
framework (32). Doing so requires the ability to compute 
P( Z s | A s )\ however, Ref. [30] showed that it is intractable 
to calculate the conditional distribution for the single-layer 
network case. To address this challenge, we use a variational 
approximation, analogous to approaches in 0, ED, ED- in 
general, a variational approximation seeks to optimize a lower 
bound on the log-likelihood. To do this, we first approximate 
the conditional distribution, P(Z S \ A s ) « Ra s > where 

R A s(Z s ) = Y[h(Zl,T i .). (6) 

i 

Here, matrix r s contains entries r/ m that approximate the 
probability that node i belongs to community m in stratum s. 
Further, function h(-) represents the multinomial distribution, 
with parameters, {rf m } for m G {1,..., K s }. Using this, we 
define the variational approximation as 


=Vf s 

m / j 1 ii 


JN, 


oc a: 


7F qt ~ 


E 


leo 


E t s t s A 1 

i<j im jn^ij 




. T- T- 

i<j im jn 


leC s i<j n 


(9) 

( 10 ) 

( 11 ) 


To find the best estimates for r s and n s , we alternate between 
updating r s and tt s until convergence. When convergence has 
occurred, we refer to the resulting estimates as the consen¬ 
sus and tP for stratum s. Similarly, Z s represents the 
consensus indicator matrix of node-to-community assignments 
computed from r s . Note that we use the bar notation to reflect 
that the particular parameter estimate is for a stratum, rather 
than for an individual layer. 


Since r s and 7v s are computed in terms of each other, 
we can use one of the consensus parameters to compute the 
other parameter in individual layers. In particular, using the 
fixed node-to-community assignments from r s , we compute 
the maximum-likelihood SBM parameters for a particular layer 
l, which we denote with a tilde and hence, tz 1 and f l . Similarly, 
for fixed 7p, we compute the node-to-community assignments 
t 1 . Such estimates allow us to determine whether or not the 
stratum consensus estimates are accurate estimates for the 
SBMs of individual layers of the stratum. More importantly, 
as we shall now describe, these layer-specific estimates allow 
us to design an iterative algorithm that allows for alternating 
between learning the node-to-community and layer-to-stratum 
assignments. 


To this end, we represent each layer by the adjacency 
probability matrix, which we compute two different ways: 
letting 0(r, 7 t) represent the adjacency probability matrix 
specified by r and 7r, we define 


J(Ra‘) = - KL(^ S (Z S ),P(Z S | A 8 )), (7) 

where U is log likelihood and KL is the Kullback-Leibler 
divergence. 

Through maximizing J(Ra s ), we minimize the KL diver¬ 
gence between the true conditional distribution, P( Z s \ A s ), 
and its approximation, P^ S (Z S ). Moreover, we follow the 
derivation in Ref. (30) and rewrite J(Ra s ) as 

J(Ra s ) = EE T im lo g(«m) 

i m 

+ E EE T ^ r i»[4 lo s(0] 

leC s i<j rnn ^ 

+ E EE^K 1 “ A i^ “ ^mn)] 

leC s i<j mn 

~ E E T im l0 S( r /m)- 

i m 

We can now differentiate J(Ra s ) with respect to each 
parameter—while using Lagrange multipliers to enforce con¬ 
straints (i.e. probabilities summing to 1)—to compute the 



(12) 

= 6 l (f l ,T^) 

(13) 


Note that the first definition uses the strata-consensus estimate 
for r s and a layer-specific estimate for 7r s , whereas the latter 
uses a layer-specific estimate for r s and the strata-consensus 
estimate for n s . 

During Phase I, we identified strata by clustering the adja¬ 
cency probability matrices for the L layers using the k -means 
algorithm. We employ a similar procedure here, but instead 
of clustering L matrices, we now cluster 2 L matrices, since 
each layer is represented in two different ways. Moreover, 
clustering these 2 L matrices yields two cluster assignments 
for each layer. Typically, both representations of a particular 
layer will receive identical cluster assignments—that is, for 
a given /, 0^ and 0| 2 ) are assigned to the same cluster, 
or strata. However, an interesting case arises when the two 
representations induce different stratum assignments for a 
given layer, because this implies that there is disagreement 
between 0^ and 0| 2 ), which implies uncertainty in the strata 
assignment of that particular layer /. Because our iterative 
algorithm requires each layer to be assigned to a single stratum 



(i.e., we do not allow for mixed membership of layers into 
strata), layers with mixed membership according to 0^ and 
0(2) must be dealt with in some way. To account for these 
situations, we define additional strata for each combination of 
membership that arises. For example, if there are several layers 
{/} that are clustered into stratum 1 according to 0^ and 
stratum 2 according to 0( 2 ), then we define a new stratum that 
contains only these layers. We note that there exists a variety 
of options for handling layers with such mixed membership 
after applying k -means clustering to 0^ and 0^ 2 ) (e.g., one 
could assign such a layer to a stratum at random); however, 
we leave open for future work the exploration of these other 
options. 

After a single pass of Phase II, which requires layer-to- 
strata assignments (which can be encoded by vector y) as 
input, the algorithm yields (ideally) improved layer-to-strata 
assignments (as well as consensus estimates for the SBM 
parameters of the strata, r s and 7v s ). Therefore, Phase II 
involves iterating the above procedure until the lay er-to-strata 
assignments do not change. We note that in principle, it is 
possible for new strata to arise in each iteration (i.e., because 
we create strata to avoid mixed membership of layers), and 
this can allow the number of strata to grow with each iteration; 
however, we did not observe this issue in any of our synthetic 
or real data experiments. As we will show in the following 
section, convergence is typically observed after just a few 
iterations (e.g., see, for example, the second row of Fig. 4). If 
such an issue arises, it may be helpful to bound the number 
of iterations in Phase II. 


III. Synthetic Data Experiements 
A. Comparison of sMLSBM to other SBM Approaches 

To demonstrate a situation where the sMLSBM framework 
has a clear advantage over other models, we designed a 
synthetic experiment and compared the results to two differ¬ 
ent SBM approaches: i) fitting a single SBM to all of the 
layers (denoted “single SBM”), and ii). fitting a stochastic 
block model to each layer individually (denoted “single¬ 
layer SBM”). We generated a multilayer network, where each 
layer has N = 128 nodes, K = 4 communities and an 
expected mean degree of c = 20 (i.e., every network layer 
is expected to contain cN / 2 = 1280 undirected edges). We 
specified an sMLSBM with S = 3 strata and 10 layers per 
strata, which resulted in L = 30 total layers. We defined 
7r s for each stratum s in terms of two parameters, p\ n and 
p s out , which give the within-community edge probabilities and 
between-community edge probabilities, respectively. That is, 
we define n s mn = p s in when to = n and ir s mn = p s out when 
m 7 ^ n. It follows that the expected mean degree is given by 
c = N(p? n + (K — 1 )p s out )/K. In our experiment, we select 
the following SBM parameters: ( p} n ,pl ut ) = (0.6,0.0083); 
(P 2 in ,P 2 out) = (0.4,0.075); and (pLkL) = (0-125,0.167). 
In Fig. 3(A), we show an example network layer from each 
strata. Nodes are colored by their community assignments in 
stratum 1. Note that the node-to-community assignments are 
different in each stratum and that the extent of block structure 
decreases from stratum 1 to stratum 3. 

In order to compare the accuracy of fit for the three 
models—single-layer SBM, single SBM and sMLSBM—we 
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Fig. 3. Synthetic experiment comparing sMLSBM to other SBMs. A. We 

specified a model with S = 3 strata and L = 10 layers per stratum. 
A representative layer from each stratum is plotted. Note that nodes in all 
networks are colored according to their community membership in stratum 
1. Each network has N = 128 nodes, K = 4 communities and mean 
degree, c = 20. The p s in parameters for s = 1, 2 and 3 are 0.6, 0.4 and 
0.25, respectively. Corresponding values of p s out were selected to maintain 
the desired expected mean degree, c=20. B. We fit 3 types of models to the 
30 network layers: i) single SBM: fitting a single SBM to all of the layers; ii) 
single-Layer SBM: fitting an individual SBM to each layer; and iii) sMLSBM: 
identifying strata and fitting an SBMs for each strata. Each model yields an 
estimate 7z s i for the true SBM of each layer l, which is denoted 7 t 1 . Here si 
denotes the inferred strata for layer l. On the vertical axis we plot the mean 
1 2 norm error ||vec(7r z ) — vec(7r s ^)|| 2 - C. For each of the three models, we 
computed the normalized mutual information (NMI) between the true node- 
to-community assignments z 1 and the inferred values z s i . 


quantify the inference accuracy of the SBM parameters, 7r^, 
and community assignments, Z Sl . First, for each layer and each 
model, we quantified the error (£ 2 norm) between vec(7r^) and 
its true value, vec(7r z ). Note that vec(X) is the length 

vector representing the lower triangle of the matrix X. More¬ 
over, to quantify error, we compute | |vec(7r z ) — vec(7r s *)112- We 
note that this error is well-defined because we identify K = 4 
communities for all layers and all models. The mean error 
across layers under each model are shown in Fig. 3(B). In this 
example, sMLSBM outperforms the two other models. Second, 
we computed for each layer the mean normalized mutual 
information (NMI) ED between the true node-to-community 
assignments, z l , and the inferred values, z^~, under each model. 
In other words, for each layer, we compute, NMI(z^z^). 
Figure 3(C) shows the mean NMI for community assignments 
across layers. 

B. Synthetic Experiment with Two Strata 

Next, we further explored the performance of our algo¬ 
rithm (see Sec. II-C| ) for inferring an sMLSBM under various 
situations: 1) in comparison to baseline clustering methods; 

2) in response to an increase in the number of layers; and 

3) under variations in levels of detectability. Specifically, 
we designed synthetic experiments in which we generated 
multilayer networks with either L = 10orZ/ = 100 layers. 







Every multilayer network contained S = 2 strata (each having 
K 1 = K 2 = 4 communities), and in each layer there were 
N = 128 nodes (each having an expected mean degree 
of c = 16). Note that in this example both strata have 
the same node-to-community assignments. The strata were 
fixed to be the same size, L 1 = L 2 = L/ 2. Similar to 
the experiment described in Sec. |III-A| the SBM parameters 
were constructed using p s in and p s out . Since we have already 
specified the expected mean degree, these parameters must 
satisfy the constraint c = N(jp\ n +Pout)/^ for both strata. 
In all simulations, we fixed the SBM parameters of the first 
strata as (p} n ,pl ut ) = (.1836, .1055). It is also convenient to 
define the quantity, N(p\ n — pl ut ) = 10, which relates to the 
detectability of communities (34). For example, the ability to 
detect community structure in a given layer and/or strata is, in 
general, expected to improve with increasing N(p s in —p s out ). 
For the second strata, we allow N(p 2 n — p 2 out ) to vary. 

We present results for this experiment in Fig. 4, wherein the 
left and right columns give results for L = 10 and L = 100, 
respectively. Symbols in each plot represent the mean over 
50 multilayer networks, and error bars show standard error. In 
each plot, the vertical dotted line indicates N(p 2 n — p 2 out ) = 10, 
which represents the point where the two strata are indistin- 
guishable since (p} n ,pl ut ) = (p 2 in ,P 2 out )- In Fig. 4(A), we 
show the NMI between the true layer-to-strata assignments 
and those inferred by sMFSBM, or NMI(y, y). As a baseline, 
we compare sMSFBM results to directly clustering the layers’ 
adjacency matrices using the k- means algorithm with K = 2. 
We consistently observe higher NMI as a result of sMFSBM 
compared to k- means. More interestingly is the case with 
L = 100, where both k- means and sMFSBM perform at least 
moderately well at partitioning layers into strata before the 
point where the strata are indistinguishable. In Fig. 4(B), we 
plot the number of iterations (NOI) required for Phase II of 
our algorithm to converge. We observe that as the number 
of layers in the network increases, so does the number of 
required sMFSBM iterations. Moreover, the peaks in panel 
B. correspond to the sudden jumps in strata NMI. 


Finally, in Fig. 4(C) we show the quality of node-to- 
community assignments by plotting the NMI between the true 
and inferre d node-to-community assignments as described in 
Sec. III-A Note that stratum 1 here represents the stratum 
where the majority of layers were generated from model S' 1 
and analogously for stratum 2. Therefore, when the strata 
NMI is low (panel A.), we see poorer community detection 
results than expected, as layers get incorrectly mixed. As the 
strata NMI increases, layers from the same model are assigned 
together and the communities NMI stabilizes. Finally, by 
comparing the results for L = 100 to those for L = 10, 
we observe an increase in number of layers, L , generally 
leads to an improvement in community detection and strata 
identification. 


IV. Correlation Networks from the Human 
Microbiome Project 

As an application of sMFSBM, we consider correlation 
networks constructed from data from the Human Microbiome 
Project (351 . For various sites on the body, the human mi¬ 
crobiome project has successfully collected multiple human 



N (Pin~PLt) N (Pin~PD 


Fig. 4. Synthetic experiment with two strata. We conducted numerical 
experiments with multilayer networks with N = 128 nodes, mean degree 
c = 16, S = 2 strata and K 1 = K 2 = 4 communities. The networks 
contained either L = 10 (left column) or L = 100 layers (right column), 
which were divided equally into the two strata. For stratum 1, we fixed 
the quantity N(jp\ n - pl ut ) = 10, which fully specifies (p} n ,pl ut ) since 
setting c = 16 also constrains these parameters. In contrast, we vary 
N(p 2 n — Pout)- A As a function of N(p 2 n — p^ ut ), we plot the mean 
NMI to interpret the ability of sMLSBM to recover the true layer-to-strata 
assignments. We compare the performance of sMLSBM (purple curve) to 
generic fc-means clustering (green symbols) of adjacency matrices. B. We 
plot the mean nu mber of iterations (NOI) required for Phase II of our 
algorithm(see Sec. |II-Q to converge. C. Finally, we measure the quality of 
node-to-community assignment results by plotting the mean NMI between 
the true node-to-community assignments and those inferred with sMLSBM in 
stratum 1 (red symbols) and stratum 2 (blue symbols). 


samples in order to better understand interactions between 
bacterial species. In this context, network inference is partic¬ 
ularly interesting, as such methods aim to capture the rela¬ 
tionships between various organisms. Microorganisms exhibit 
intricate ecologies within the gut of their human host and 
particular body sites have been shown to possess characteristic 
interactions. Further, certain interactions between microbes 
can often be associated with particular health and disease 
states (36]. Microbiome data is typically collected through 
metagenomic sequencing and reads are further binned into 
groups, known as operational taxonomic units (OTUs), to 
represent particular organisms. The nature of this count-based 
sequencing data makes network inference challenging, and is 
thus an interesting field in itself. To demonstrate the potential 
use for sMFSBM in the context of the human microbiome, 
we applied our algorithm for learning sMFSBMs to multilayer 
networks constructed from the SparCC lf37l network inference 
method. 

SparCC is a correlation network inference method that 
aims to approximate the linear Pearson correlation between 
components in a system. This method performs favorably, as it 
accounts for the extent of diversity in the microbial community, 
which plays a significant role in detecting valid interactions. 
Furthermore, networks are constructed with the assumptions 












































Fig. 5. Hierarchical clustering of SparCC networks. Hierarchical clustering was performed on binary adjacency matrices that were constructed by thresholding 
correlations between operational taxonomic units (OTUs), which give the nodes in our multilayer network. There are L = 18 network layers that correspond to 
18 body sites in the human body, as shown by the leaves in the dendogram. The data was obtained from Ref. E2 Colored boxes around the leaves indicate 
layer-to-assignments according to our fit of an sMLSBM. Although there is agreement between the hierarchical clustering and our inferred strata, we point out 
that there is considerable variability in the hierarchical clustering results as it is not possible to obtain a good partitioning of the layers into the sMLSBM-oriented 
strata by cutting the dendrogram horizontally. In contrast, there is a very clear biological relevancy of the strata inferred by sMLSBM (which moreover provides 
a generative model for the network layers). 


that the number of components in the system (e.g. OTUs) 
is large and that the correlation network should be sparse. 
As supplemental data in Ref. <37 ) , the authors provided their 
inferred microbial interaction networks for 18 sites in the 
human body. The edges in these networks have positive and 
negative real-valued weights, based on the results of SparCC 
inference. In this analysis, we converted the SparCC networks 
into binary adjacency matrices by allowing a link only if the 
SparCC edge-weight between two OTUs was at least 0.2 (as 
given in Ref (37)). To convert the 18 single-layer networks 
corresponding to species interactions in 18 body sites, we 
found the collection of nodes (OTUs) that occurred in at least 
2 of the layers. This resulted in N = 213 unique OTUs (nodes) 
for our multilayer network analysis. 

We inferred an sMLSBM for the multilayer network and 
found S = 6 strata, implying that we find 6 clusters of 
body sites such that the microbiomes are similar between 
sites in the same cluster but differ from microbiomes at sites 
in the remaining clusters. To gauge the performance of our 
method, we compared the results to a hierarchical clustering 
(euclidean distance and complete linkage) of the networks. 
In Fig. 5 we show the dendrogram depicting the hierarchical 
clustering result, wherein the vertical axis denotes distance. 
Also captured in this figure are the sMLSBM results; leaves 
correspond to body sites and the colored boxes indicate strata 
assignments for sMLSBM, which do a very good job of 
identifying biologically meaningful groups. For example, it 
is intuitive that the saliva, hard palate and tongue dorsum 
layers have very similar microbe species interaction networks. 
However, the stratum with K s = 6 layers seems to be a 
miscellaneous cluster. Using the dendrogram to compare the 
sMLSBM results with hierarchical clustering, we see that the 
quality of the clustering partition is highly dependent on where 
the tree is cut. It is difficult to find a cut of the tree that 


partitions the body sites in a way that is as meaningful as the 
result of fitting sMLSBM. Moreover, fitting sMLSBM also 
provides a generative model for each stratum. 

The utility of having a probabilistic generative model for 
the microbiomes is illustrated in Fig. 6, where we illustrate net¬ 
work layers for 4 of the 6 strata that we identify. Specifically, 
each row provides information about the network layers and 
their fitted sMLSBM model for a particular stratum. Each grid 
in the figure represents the binary adjacency matrix encoding 
interactions between OTUs: a colored dot at position (i, j) 
indicates the existence of an edge (i, j) in the corresponding 
network layer. In the first column of each row is a sample 
network generated with the learned SBM parameters of that 
stratum, 7r s and Z s . Columns 2 and 3 show two representative 
network layers within the stratum. Note that while some strata 
have more than two members, for illustrative purposes we 
only show two example layers. It is easy to see the very 
similar block structure between all networks in a given row, 
corroborating the usefulness of the sMLSBM approach. 

V. Conclusion and Future Work 

We developed a novel model for multilayer stochastic block 
models (MLSBMs) and an associated algorithm to jointly par¬ 
tition layers into strata and nodes into communities. Our model 
assumes that layers belonging to a stratum have community 
structure following the same underlying SBM. To fit sMLSBM 
to a multilayer network, and more-specifically, a multiplex 
network, we iteratively alternate between rearranging layer-to- 
strata assignments and updating the model parameters for each 
stratum. Having multiple networks within a stratum—hence 
multiple realizations from some underlying model—helps to 
make inference more accurate. Particularly, more accurate 
assignments of nodes-to-communities within a stratum leads 
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Fig. 6. Visualization of Strata in SparCC Networks. We visualize the 
adjacency matrices for SparCC networks that encode microbiome interactions 
at body sites. In each panel, a colored dot at position (i,j) indicates the 
existence of an edge (i,j) in the corresponding network layer. The four rows 
correspond to four different strata. In column 1, we show a sample network 
generated from the SBM parameters, 7r s and Z s , that we inferred for that 
stratum. In Columns 2 and 3, we show SparCC networks from that particular 
stratum. Note the strong similarity across each row. 


to improved estimation of SBM probability parameters, and 
vice versa. We have shown for multiplex networks with several 
strata (e.g., see Fig. 3) that inaccuracies can arise if one 
attempts to fit a single SBM to the network or study the 
network layers in isolation. In contrast, our model allows 
for an understanding of the similarities between layers in a 
network, in terms of their community structure. The ability to 
identify strata within collections of networks holds promise in 
numerous applications. 

There are several extensions to sMLSBM that could make 
the approach more accurate and applicable to a wider range of 
applications. First, as is typical for SB Ms, it would be useful to 
consider the degree-corrected [38] and overlapping community 
(i.e., mixed-membership) (38) varieties. Next, it may be useful 
to consider mixed membership of layers into strata, as well as 
nodes into communities. Further, sMLSBM as implemented 
here is only appropriate in unweighted, undirected networks. 
Extensions to weighted and directed networks, as shown in 
[391 and (40), could be quite useful. 


Finally, the microbiome example considered here reveals 
some interesting computational biology questions that could 
facilitate the development of more advanced network tools. 
To construct the multilayer network, negative edges were 
thresholded away; however, antagonistic relationships between 
microbes are known to be important [411. Thus, it would be 
useful to develop a signed version of sMLSBM that allows 
edges to be either positive or negative. 

The rise of a greater number of multilayer network datasets 
is providing the need for additional tools for the construction 
and analysis of such networks. The sMLSBM provides a new 
method to find signal in inherently noisy and complex network 
data. 
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